hardness measure
What makes math problems hard for reinforcement learning: a case study
Using a long-standing conjecture from combinatorial group theory, we explore, from multiple perspectives, the challenges of finding rare instances carrying disproportionately high rewards. Based on lessons learned in the context defined by the Andrews-Curtis conjecture, we analyze how reinforcement learning agents handle problems of varying hardness. We also address many mathematical questions as a part of our study. Notably, we demonstrate the length reducibility of all but two presentations in the Akbulut-Kirby series (1981), and resolve various potential counterexamples in the Miller-Schupp series (1991), including three infinite subfamilies.
How hard is my MDP?" The distribution-norm to the rescue"
In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel $p$. In many problems, a good approximation of $p$ is not needed. For instance, if from one state-action pair $(s,a)$, one can only transit to states with the same value, learning $p(\cdot|s,a)$ accurately is irrelevant (only its support matters). This paper aims at capturing such behavior by defining a novel hardness measure for Markov Decision Processes (MDPs) we call the {\em distribution-norm}. The distribution-norm w.r.t.~a measure $\nu$ is defined on zero $\nu$-mean functions $f$ by the standard variation of $f$ with respect to $\nu$. We first provide a concentration inequality for the dual of the distribution-norm. This allows us to replace the generic but loose $||\cdot||_1$ concentration inequalities used in most previous analysis of RL algorithms, to benefit from this new hardness measure. We then show that several common RL benchmarks have low hardness when measured using the new norm. The distribution-norm captures finer properties than the number of states or the diameter and can be used to assess the difficulty of MDPs.
How hard is my MDP?" The distribution-norm to the rescue"
Odalric-Ambrym Maillard, Timothy A. Mann, Shie Mannor
In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel p. In many problems, a good approximation of p is not needed. For instance, if from one state-action pair (s, a), one can only transit to states with the same value, learning p( |s, a) accurately is irrelevant (only its support matters). This paper aims at capturing such behavior by defining a novel hardness measure for Markov Decision Processes (MDPs) based on what we call the distribution-norm. The distributionnorm w.r.t. a measure ν is defined on zero ν-mean functions f by the standard variation of f with respect to ν. We first provide a concentration inequality for the dual of the distribution-norm.
built from n i.i.d samples, we control ||p p
In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel p. In many problems, a good approximation of p is not needed. For instance, if from one state-action pair (s, a), one can only transit to states with the same value, learning p( |s, a) accurately is irrelevant (only its support matters). This paper aims at capturing such behavior by defining a novel hardness measure for Markov Decision Processes (MDPs) based on what we call the distribution-norm. The distributionnorm w.r.t. a measure ν is defined on zero ν-mean functions f by the standard variation of f with respect to ν. We first provide a concentration inequality for the dual of the distribution-norm.
The Unreasonable Effectiveness of Easy Training Data for Hard Tasks
Hase, Peter, Bansal, Mohit, Clark, Peter, Wiegreffe, Sarah
How can we train models to perform well on hard test data when hard training data is by definition difficult to label correctly? This question has been termed the scalable oversight problem and has drawn increasing attention as language models have continually improved. In this paper, we present the surprising conclusion that current language models often generalize relatively well from easy to hard data, even performing as well as "oracle" models trained on hard data. We demonstrate this kind of easy-to-hard generalization using simple training methods like in-context learning, linear classifier heads, and QLoRA for seven different measures of datapoint hardness, including six empirically diverse human hardness measures (like grade level) and one model-based measure (loss-based). Furthermore, we show that even if one cares most about model performance on hard data, it can be better to collect and train on easy data rather than hard data, since hard data is generally noisier and costlier to collect. Our experiments use open models up to 70b in size and four publicly available question-answering datasets with questions ranging in difficulty from 3rd grade science questions to college level STEM questions and general-knowledge trivia. We conclude that easy-to-hard generalization in LMs is surprisingly strong for the tasks studied, suggesting the scalable oversight problem may be easier than previously thought. Our code is available at https://github.com/allenai/easy-to-hard-generalization
Characterizing instance hardness in classification and regression problems
Torquette, Gustavo P., Nunes, Victor S., Paiva, Pedro Y. A., Neto, Lourenço B. C., Lorena, Ana C.
Some recent pieces of work in the Machine Learning (ML) literature have demonstrated the usefulness of assessing which observations are hardest to have their label predicted accurately. By identifying such instances, one may inspect whether they have any quality issues that should be addressed. Learning strategies based on the difficulty level of the observations can also be devised. This paper presents a set of meta-features that aim at characterizing which instances of a dataset are hardest to have their label predicted accurately and why they are so, aka instance hardness measures. Both classification and regression problems are considered. Synthetic datasets with different levels of complexity are built and analyzed. A Python package containing all implementations is also provided.
PyHard – A tool to assess dataset quality and identify hard-to-classify instances in general - DataScienceCentral.com
Picture Courtesy: Freepik The article explains the algorithm behind the recently introduced Python package named PyHard, based on the concept of Instance Space Analysis. It helps in assessing the quality of a dataset and identifying what are the instances which are hard/easy to classify. With the help of this algorithm we can separate out noisy… Read More »PyHard – A tool to assess dataset quality and identify hard-to-classify instances in general
How hard is my MDP?" The distribution-norm to the rescue"
Maillard, Odalric-Ambrym, Mann, Timothy A., Mannor, Shie
In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel $p$. In many problems, a good approximation of $p$ is not needed. For instance, if from one state-action pair $(s,a)$, one can only transit to states with the same value, learning $p(\cdot s,a)$ accurately is irrelevant (only its support matters). This paper aims at capturing such behavior by defining a novel hardness measure for Markov Decision Processes (MDPs) we call the {\em distribution-norm}. The distribution-norm w.r.t. a measure $ u$ is defined on zero $ u$-mean functions $f$ by the standard variation of $f$ with respect to $ u$.